-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sid/datetime convertor functions #42
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rsiddharthan I made some comments and change requests, based on what we dicussed today.
(defn ->year | ||
[localdate] | ||
(.getYear ^java.time.LocalDate localdate)) | ||
|
||
(defn ->year-month | ||
[localdate] | ||
(let [yyyy (.getYear ^java.time.LocalDate localdate) | ||
mmm (-> (.getMonth ^java.time.LocalDate localdate) | ||
(.getDisplayName ^java.time.format.TextStyle TextStyle/SHORT | ||
(^java.util.Locale Locale/getDefault)))] | ||
[yyyy mmm])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rsiddharthan if we were going to support these types, in order to be consistent with the conversion api these conversion pathways would need to go through the convert-to
method, which supports universal conversion. i.e. the fn signatured would be (->year any-datetime)
, and ->year
would be a convenience method on top of convert-to
.
However, per our discussion today that we will try not supporting the types Year
, YearQuarter
, and YearMonth
these should be removed I think.
@@ -20,6 +22,11 @@ | |||
(defn year->local-date [^Year year] | |||
(-> year (.atMonthDay (java.time.MonthDay/parse "--01-01")))) | |||
|
|||
(defn year-quarter->local-date [year-quarter format-pattern] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rsiddharthan let's make the format-pattern
optional, and by default support the string format that org.threeten.extra.YearQuarter/parse
accepts.
In terms of the location of this function, I think it can make sense to include it in this namespace, but maybe put it at the end with a comment explaining that it (and presumably its partner year-month->local-date
) are special helpers to help us work with these special types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moving format-pattern as optional is simple to do, and can default to threeten conventions.
(defn year-quarter->local-date [year-quarter format-pattern] | ||
(-> year-quarter | ||
(YearQuarter/parse (DateTimeFormatter/ofPattern format-pattern)) | ||
.atEndOfQuarter)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that our convention should be instead to use the start of the quarter. Could be convinced otherwise, but that make more intuitive sense to me. If I think of the date representation of a quarter and a month, I wouldn't think of the end of the period...
Or do you have a principled reason for choosing the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to support both begin and end, but did not see the begin implementation. This is another reason why we should move to the 2nd option when doable - and we can remove dependencies on the threeten conventions.
I am thinking of refactoring so it fits with the approach you quoted above, but really treat it as a hack to move forward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you are referring to the atDay method as that puzzled me and was not sure what that means as the period spans 3 months. I was looking for the equivalent of .atEndOfQuarter for the beginning. Obviously we can shift the previous quarter value by 1 day, but did not have appetite to do that. So, I remained at end of period as a easy convention.
Refactoring from convertors so these "time objects that are higher than date" data types are available in utils. In particular, move the bi-directional conversion methods for year-quarter (both string->local-date, and local-date->string)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you are referring to the atDay method as that puzzled me and was not sure what that means as the period spans 3 months...
(.atDay some-year-quarter 1)
just gives you the first day of the each quarter. E.g.
(-> "2021-Q1"
org.threeten.extra.YearQuarter/parse
(.atDay 1))
;; => #time/date "2021-01-01"
(-> "2021-Q2"
org.threeten.extra.YearQuarter/parse
(.atDay 1))
;; => #time/date "2021-04-01"
;; etc...
Goal / Problem
We depend on dtype-next datatypes, including datetime representations for the data in datasets. As the datetime datatype uses integer values for datetimes, we have a gap in modeling higher order granularity in time, such as quarter, month, week, etc.
Proposed Solution
The approach considered here is to treat the higher order time objects as dates, by setting it to the last date of that time period.
So, "2020-Q1" is represented as the local date "2020-03-31", as that is the last-date of the quarter. Last date of the period is just a convention that we are adopting.
Similarly, it is easy to see that "2020-Mar" will be "2020-03-31". We leave it to the user to differentiate quarter vs month aspects of a date "2020-03-21".
The key advantage with this approach is that we can leverage all the features of a datetime datatype, including the ability to automatically index on such columns.
Work remaining
Open Questions
At some point, we need to look for alternatives for the above approach. One such possibility is to deconstruct year-quarter into a tuple of two integers for year and quarter, with each value as a column in the data set. The indexing will need a multiple column index solution.